AITopics | feasible policy

Provably Safe Reinforcement Learning with Step-wise Violation Constraints

Neural Information Processing Systems | Feb-16-2026, 10:54:39 GMT

We name this problem Safe-RL-SW. Our step-wise violation constraint differs from prior expected violation constraints (Wachi & Sui, 2020; Efroni et al., 2020b; Kalagarla et al., 2021) in two aspects: (i) Minimizing the step-wise violation enables the agent to learn an optimal policy that avoids unsafe regions deterministically,

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
Asia > China (0.04)

Genre: Workflow (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Provably Safe Reinforcement Learning with Step-wise Violation Constraints

Neural Information Processing Systems | Feb-16-2026, 10:54:35 GMT

We name this problem Safe-RL-SW. Our step-wise violation constraint differs from prior expected violation constraints (Wachi & Sui, 2020; Efroni et al., 2020b; Kalagarla et al., 2021) in two aspects: (i) Minimizing the step-wise violation enables the agent to learn an optimal policy that avoids unsafe regions deterministically,

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.46)

Constrained Cross-Entropy Method for Safe Reinforcement Learning

Min Wen, Ufuk Topcu

Neural Information Processing Systems | Feb-12-2026, 14:42:21 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, constraint, international conference, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Pennsylvania (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees

Sourav Ganguly, Arnob Ghosh, Kishan Panaganti, Adam Wierman

arXiv.org Artificial Intelligence | Dec-3-2025

Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that maximizes the cumulative reward while satisfying a constraint, even when there is a mismatch between the real model and an accessible simulator/nominal model. In particular, we consider the robust constrained Markov decision problem (RCMDP), where an agent needs to maximize the reward and satisfy the constraint against the worst possible stochastic model in an uncertainty set centered around an unknown nominal model. Primal-dual methods, effective for the standard constrained MDP (CMDP), are not applicable here because strong duality does not hold. Further, one cannot apply the standard robust value-iteration approach to the composite value function either, as the worst-case models may differ between the reward value function and the constraint value function. We propose a novel technique that minimizes the constraint value function to satisfy the constraints; when all the constraints are satisfied, it simply maximizes the robust reward value function. We prove that such an algorithm finds a feasible policy with at most $ε$ sub-optimality after $O(ε^{-2})$ iterations. In contrast to the state-of-the-art method, we do not need to employ a binary search; thus, we reduce the computation time by at least 4x for smaller values of the discount factor ($γ$) and by at least 6x for larger values of $γ$.
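The switching idea in this abstract (reduce the constraint value while it is violated, otherwise improve the reward) can be illustrated with a toy sketch. This is not the authors' algorithm: the one-dimensional "policy" parameter `theta`, the stand-in value functions, the budget, and the step size are all hypothetical choices made only for illustration, and no robust (worst-case) evaluation is performed.

```python
def switching_update(theta, reward_grad, cost_value, cost_grad, budget, step):
    """One step of a toy switching rule (hypothetical, not the paper's method)."""
    if cost_value(theta) > budget:
        # Constraint violated: descend on the constraint value.
        return theta - step * cost_grad(theta)
    # Constraint satisfied: ascend on the reward value.
    return theta + step * reward_grad(theta)


def run_toy_example(theta0=1.0, budget=0.25, step=0.05, iters=200):
    """Return the best feasible iterate found on a toy problem.

    Assumed stand-ins: reward(theta) = theta, cost(theta) = theta**2,
    constraint cost(theta) <= budget. The constrained optimum is theta = 0.5.
    """
    reward_value = lambda t: t
    reward_grad = lambda t: 1.0
    cost_value = lambda t: t * t
    cost_grad = lambda t: 2.0 * t

    theta = theta0
    best = None
    for _ in range(iters):
        feasible = cost_value(theta) <= budget
        if feasible and (best is None or reward_value(theta) > reward_value(best)):
            best = theta  # keep the feasible iterate with the highest reward
        theta = switching_update(
            theta, reward_grad, cost_value, cost_grad, budget, step
        )
    return best


if __name__ == "__main__":
    print(run_toy_example())
```

With a fixed step size the iterates oscillate around the constraint boundary, which is why the sketch returns the best feasible iterate rather than the last one.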

constraint, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

arXiv:2505.19238

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.68)

Industry: Water & Waste Management > Solid Waste Management (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

A Lyapunov-based Approach to Safe Reinforcement Learning

Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, Mohammad Ghavamzadeh

Neural Information Processing Systems | Nov-20-2025, 16:23:30 GMT

In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints. In particular, besides optimizing performance, it is crucial to guarantee the safety of an agent during training as well as deployment (e.g., a robot should avoid taking actions - exploratory or not - which irrevocably harm its hardware). To incorporate safety in RL, we derive algorithms under the framework of constrained Markov decision processes (CMDPs), an extension of the standard Markov decision processes (MDPs) augmented with constraints on expected cumulative costs.
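For reference, a CMDP of the kind described here is commonly written in the following generic form. The symbols are assumptions for illustration and are not taken from the abstract: $r$ is the reward, $d$ the constraint cost, $d_0$ the cost budget, $γ$ the discount factor, and $π$ the policy.

$$
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, d(s_t, a_t)\right] \le d_0 .
$$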

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada (0.04)
(2 more...)

Genre: Research Report (0.46)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Constrained Cross-Entropy Method for Safe Reinforcement Learning

Min Wen, Ufuk Topcu

Neural Information Processing Systems | Nov-20-2025, 15:38:45 GMT

We study a safe reinforcement learning problem in which the constraints are defined as the expected cost over finite-length trajectories.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Pennsylvania (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

aa3e67220ca4cd50010165c950fc8056-Supplemental-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 04:17:03 GMT

constraint, unsafe state, violation, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
Asia > China (0.04)

Genre: Workflow (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.67)

aa3e67220ca4cd50010165c950fc8056-Paper-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 04:16:59 GMT

constraint, exploration, violation, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

NeurIPS20_SafeCL

Matteo Turchetta

Neural Information Processing Systems | Aug-15-2025, 02:03:28 GMT

curriculum policy, intervention, student, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Redmond (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Industry:

Education (1.00)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Reviews: Constrained Cross-Entropy Method for Safe Reinforcement Learning

Neural Information Processing Systems | May-26-2025, 05:38:51 GMT

This paper studies constrained optimal control, where the goal is to produce a policy that maximizes an objective function subject to a constraint. The authors provide great motivation for this setting, explaining why the constraint cannot simply be included as a large negative reward. They detail challenges in solving this problem, especially if the initial policy does not satisfy the constraint. They also note a clever extension of their method, where they use the constraint to define the objective, by setting the constraint to indicate whether the task is solved. Their algorithm builds upon CEM: at each iteration, if there are no feasible policies, they maximize the constraint function for the policies with the largest objective; otherwise, they maximize the objective function for feasible policies.
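The elite-selection step the review describes can be sketched roughly as follows. This is a hedged illustration, not the authors' code: the function name `select_elites`, the convention that a sample is feasible when `cost(x) <= limit`, and the rule of switching to constraint-based ranking whenever fewer than `n_elite` samples are feasible are all assumptions made for the example.

```python
def select_elites(samples, objective, cost, limit, n_elite):
    """Pick the elite samples used to refit a CEM sampling distribution.

    Hypothetical sketch: `objective` is maximized, and a sample is treated
    as feasible when cost(sample) <= limit.
    """
    feasible = [x for x in samples if cost(x) <= limit]
    if len(feasible) < n_elite:
        # Too few feasible samples: rank every sample by constraint cost
        # (lower is better) to push the distribution toward feasibility.
        return sorted(samples, key=cost)[:n_elite]
    # Enough feasible samples: rank only those by objective (higher is better).
    return sorted(feasible, key=objective, reverse=True)[:n_elite]


if __name__ == "__main__":
    candidates = [0, 1, 2, 3, 4]
    objective = lambda x: -(x - 3) ** 2  # unconstrained optimum at x = 3
    cost = lambda x: x

    # Three candidates satisfy cost <= 2, so elites are chosen by objective.
    print(select_elites(candidates, objective, cost, limit=2, n_elite=2))
    # Only one candidate satisfies cost <= 0, so elites are chosen by cost.
    print(select_elites(candidates, objective, cost, limit=0, n_elite=2))
```

In a full cross-entropy loop, the sampling distribution over policy parameters would be refit to the returned elites at each iteration; that part is omitted here.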

constrained cross-entropy method, constraint, safe reinforcement learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)
